Abstract. Computing supertrees is a central problem in phylogenetics. The supertree method
that is by far the most widely used today was introduced in 1992 and is called Matrix
Representation with Parsimony analysis (MRP). Matrix Representation using Flipping (MRF), which was
introduced in 2002, is an interesting variant of MRP: MRF is arguably more relevant than MRP
and various efficient implementations of MRF have been presented. From a theoretical point of
view, implementing MRF or MRP amounts to solving NP-hard optimization problems. The aim of this
paper is to study the approximability and the fixed-parameter tractability of the optimization
problem corresponding to MRF, namely Minimum-Flip Supertree. We prove strongly negative
results: the problem is W[2]-hard, and it cannot be approximated within any constant factor in
polynomial time unless P = NP.
1 Introduction



When studying the evolutionary relatedness of current taxa, the discovered relations are
usually represented as rooted trees, called phylogenies. Phylogenies for various taxa sets are
routinely inferred from various kinds of molecular and morphological data sets. A subsequent
problem is computing supertrees [3], i.e., amalgamating phylogenies for non-identical but
overlapping taxon sets to obtain more comprehensive phylogenies. Constructing supertrees
is easy if no contradictory information is contained in the data [1]. However, incompatible
input phylogenies are the rule rather than the exception in practice. The major problem for
supertree methods is thus dealing with incompatibilities.

The supertree method that is by far the most widely used today was independently proposed
by Baum [3] and Ragan [26] in 1992; it is called Matrix Representation with Parsimony
analysis (MRP) [3]. From a theoretical point of view, implementing MRP amounts to designing an
algorithm for an NP-hard optimization problem [14, 18], so the running times of MRP algorithms
are sometimes prohibitive for large data sets.

In 2002, Chen et al. proposed a variant of MRP [11], which was later called Matrix
Representation using Flipping (MRF) [9]. MRF is arguably more relevant than MRP [3] (see
also [12, 16]), and various efficient implementations of MRF have been presented [10, 13, 16].
However, as in the case of MRP, implementing MRF amounts to designing an algorithm for an NP-hard
optimization problem [12], namely Minimum-Flip Supertree. The aim of the present paper
is to study the approximability and the fixed-parameter tractability [17] of Minimum-Flip
Supertree. We prove strongly negative results.

2 Preliminaries 

For each finite set $X$, the cardinality of $X$ is denoted $|X|$. The ring of integers is denoted $\mathbb{Z}$.
Define $\overline{\mathbb{Z}} = \mathbb{Z} \cup \{-\infty, +\infty\}$.

2.1 Rooted phylogenies

Let $S$ be a finite set. A (rooted) phylogeny for $S$ is a subset $T$ of the power set of $S$ that
satisfies the following properties: $\emptyset \in T$, $S \in T$, $\{s\} \in T$ for all $s \in S$, and
$X \cap Y \in \{\emptyset, X, Y\}$ for all $X, Y \in T$. The elements of $S$ are the leaves of $T$. The
elements of $T$ are the clusters of $T$. The most natural representation of $T$ is, of course, a rooted
graph-theoretic tree with $|T| - 1$ nodes (the empty cluster does not correspond to any vertex).

Given two phylogenies $T_1$ and $T_2$ for $S$, $T_1$ is a subset of $T_2$ if, and only if, the graph
representation of $T_1$ can be obtained from the graph representation of $T_2$ by contracting
(internal) edges. If $T_1$ is a subset of $T_2$ and if we assume that hard polytomies never occur
then $T_2$ is at least as informative as $T_1$.
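
The defining properties of a phylogeny can be checked mechanically. The following is a minimal Python sketch, not taken from the paper: the function name is_phylogeny and the representation of clusters as Python sets are assumptions made for illustration only.

```python
from itertools import combinations

def is_phylogeny(leaves, clusters):
    """Return True if `clusters` is a rooted phylogeny for `leaves`."""
    S = frozenset(leaves)
    T = {frozenset(X) for X in clusters}
    # Every cluster must be a subset of S.
    if any(not X <= S for X in T):
        return False
    # The empty set, S itself and all singletons must be clusters.
    required = {frozenset(), S} | {frozenset([s]) for s in S}
    if not required <= T:
        return False
    # Any two clusters must be disjoint or nested.
    return all((X & Y) in (frozenset(), X, Y) for X, Y in combinations(T, 2))
```

For instance, with leaves a, b, c, the family consisting of the empty set, the three singletons, {a, b} and {a, b, c} passes the test, whereas adding the cluster {b, c} makes it fail because {a, b} and {b, c} overlap without being nested.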

2.2 Bipartite graphs and perfect phylogenies 

A bipartite graph is a triple $G = (C, S, E)$, where $C$ and $S$ are two finite sets and $E$ is a subset
of $C \times S$. The elements of $E$ are the edges of $G$. The elements of $(C \times S) \setminus E$ are the non-edges
of $G$. For each $c \in C$, $N_G(c)$ denotes the neighborhood of $c$ in $G$: $N_G(c) = \{s \in S : (c, s) \in E\}$.

Let $M(G)$ denote the set of all quintuples $(s, c, s', c', s'') \in S \times C \times S \times C \times S$ such that
$(c, s) \in E$, $(c, s') \in E$, $(c', s') \in E$, $(c', s'') \in E$, $(c, s'') \notin E$, and $(c', s) \notin E$. The latter
conditions state that the bipartite graph depicted in [25, Figure 4] is an induced subgraph
of $G$. A perfect phylogeny for $G$ is a phylogeny $T$ for $S$ such that $N_G(c)$ is a cluster of $T$ for
every $c \in C$. We say that $G$ is M-free 0QIHI31E2] (or T-free) 01E1I25]) if the following three
equivalent conditions are met:

1. for all $c, c' \in C$, $N_G(c) \cap N_G(c') \in \{\emptyset, N_G(c), N_G(c')\}$,

2. M(G) is empty, and 

3. there is a perfect phylogeny for G. 

Put $T_G = \{\emptyset, S\} \cup \{N_G(c) : c \in C\} \cup \{\{s\} : s \in S\}$. If $G$ is M-free then $T_G$ satisfies
the following two properties:

1. $T_G$ is a perfect phylogeny for $G$.

2. $T_G$ is a subset of any perfect phylogeny for $G$.
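
Condition 1 and the construction of the canonical phylogeny translate directly into code. The sketch below is illustrative only: the names neighborhood, is_m_free and canonical_phylogeny are hypothetical, edges are assumed to be given as a set of (c, s) pairs, and the pairwise test is the naive one rather than a linear-time recognition algorithm.

```python
from itertools import combinations

def neighborhood(E, c):
    """N_G(c): the set of all s such that (c, s) is an edge."""
    return frozenset(s for (c2, s) in E if c2 == c)

def is_m_free(C, S, E):
    """Condition 1: any two neighborhoods are disjoint or nested."""
    N = [neighborhood(E, c) for c in C]
    return all((X & Y) in (frozenset(), X, Y) for X, Y in combinations(N, 2))

def canonical_phylogeny(C, S, E):
    """Return the phylogeny built from the neighborhoods, or None if G is not M-free."""
    if not is_m_free(C, S, E):
        return None
    T = {frozenset(), frozenset(S)}
    T |= {neighborhood(E, c) for c in C}
    T |= {frozenset([s]) for s in S}
    return T
```

On the graph with edges (1, a), (1, b), (2, b), (2, c), the two neighborhoods {a, b} and {b, c} overlap without being nested, so the graph is not M-free and no perfect phylogeny exists.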

Modelization. In our model, $S$ is a set of species (or more generally taxa) and $C$ is a set of
binary characters. For each $(c, s) \in C \times S$, $(c, s) \in E$ means that species $s$ possesses character
$c$ and $(c, s) \notin E$ means that species $s$ does not possess character $c$. Character data come from
the morphological and/or molecular properties of the taxa [21]. The assumption of the model
is that for all $c \in C$ and all $s, s' \in S$, the following two assertions are equivalent:

1. Both species s and s' possess character c. 

2. Some common ancestor of species s and s' possesses character c. 

A phylogeny for S satisfies the assumption of the model if, and only if, it is a perfect phylogeny 
for G. 

2.3 Dealing with incomplete and/or erroneous data sets 

A bipartite draft-graph (or weighted bipartite fuzzy graph [6]) is a triple $H = (C, S, F)$, where
$C$ and $S$ are two finite sets and $F$ is a function from $C \times S$ to $\overline{\mathbb{Z}}$. The function $F$ is the weight
function of $H$. The range of $F$ is called the weight range of $H$. An edge of $H$ is an element
$e \in C \times S$ such that $F(e) \ge 1$. A joker-edge of $H$ is an element $e \in C \times S$ such that $F(e) = 0$.
A non-edge of $H$ is an element $e \in C \times S$ such that $F(e) \le -1$.

For each $e \in C \times S$, the magnitude of $F(e)$ is the edit cost of $e$ in $H$. An edition of $H$ is a
bipartite graph $G$ of the form $G = (C, S, E)$ for some subset $E \subseteq C \times S$. A conflict between
$G$ and $H$ is an element $e \in C \times S$ that satisfies one of the following two conditions:

1. $e$ is an edge of $G$ and $e$ is a non-edge of $H$, or

2. $e$ is a non-edge of $G$ and $e$ is an edge of $H$.

The sum of the edit costs in $H$ over all conflicts between $G$ and $H$ is denoted $A(G, H)$:

$$A(G, H) = \sum_{e \in E} \max\{0, -F(e)\} + \sum_{e \in (C \times S) \setminus E} \max\{0, F(e)\}.$$
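
The cost formula can be evaluated with a double loop over C x S. The sketch below is an illustration under stated assumptions: the function name edit_cost is hypothetical, the weight function F is assumed to be a dictionary keyed by (c, s) pairs, and infinite weights are assumed to be encoded as math.inf.

```python
import math

def edit_cost(C, S, F, E):
    """A(G, H) for the edition G = (C, S, E) of the draft-graph H = (C, S, F)."""
    total = 0
    for c in C:
        for s in S:
            w = F[(c, s)]
            if (c, s) in E:
                total += max(0, -w)  # edge of G: costs |w| if it is a non-edge of H
            else:
                total += max(0, w)   # non-edge of G: costs w if it is an edge of H
    return total
```

Joker-edges (weight 0) never contribute to the cost, and a single conflict on a pair of infinite weight makes the total cost math.inf.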

The following minimization problem and its (parameterized) decision version generalize 
several previously studied problems: 

Name: Minimum M-Free Edition or Min Edit.
Input: A bipartite draft-graph $H$.
Solution: An M-free edition $G$ of $H$.
Measure: $A(G, H)$.

Name: M-free Edition or Edit.
Input: A bipartite draft-graph $H$ and an integer $k \ge 0$.
Question: Is there an M-free edition $G$ of $H$ such that $A(G, H) \le k$?
Parameter: $k$.
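
To make the optimization problem concrete, the following sketch solves Min Edit by exhaustive search on tiny instances. It is an exponential-time illustration only, not an algorithm from the paper or the literature; the names is_m_free, edit_cost and brute_force_min_edit are hypothetical, and F is assumed to be a dictionary keyed by (c, s) pairs.

```python
from itertools import combinations

def is_m_free(C, S, E):
    """Neighborhoods of any two characters are disjoint or nested."""
    N = [frozenset(s for s in S if (c, s) in E) for c in C]
    return all((X & Y) in (frozenset(), X, Y) for X, Y in combinations(N, 2))

def edit_cost(C, S, F, E):
    """A(G, H) for the edition G = (C, S, E) of H = (C, S, F)."""
    return sum(max(0, -F[c, s]) if (c, s) in E else max(0, F[c, s])
               for c in C for s in S)

def brute_force_min_edit(C, S, F):
    """Try every edge set E (exponential); return (optimal cost, one optimal E)."""
    pairs = [(c, s) for c in C for s in S]
    best_cost, best_E = None, None
    for r in range(len(pairs) + 1):
        for subset in combinations(pairs, r):
            E = set(subset)
            if not is_m_free(C, S, E):
                continue
            cost = edit_cost(C, S, F, E)
            if best_cost is None or cost < best_cost:
                best_cost, best_E = cost, E
    return best_cost, best_E
```

The empty edition is always M-free, so the search always returns a solution. On two characters whose positive entries are {a, b} and {b, c} over the species a, b, c, with all weights in {-1, +1}, the optimum is 1: a single flip is needed to make the two neighborhoods nested or disjoint.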

For each subset $X \subseteq \overline{\mathbb{Z}}$, define Min Edit-$X$ as the restriction of Min Edit to those
bipartite draft-graphs whose weight ranges are subsets of $X$, and similarly, define Edit-$X$ as
the restriction of Edit to those instances $(H, k)$ such that the weight range of $H$ is a subset of
$X$. Notably, Min Edit-$\{-1, 0, +1\}$ is the Minimum-Flip Supertree problem and its restriction
Min Edit-$\{-1, +1\}$ is the Minimum-Flip Consensus Tree problem [¥|[5|[9UT3 l fT6 | [22] .

Modelization. Incomplete and/or possibly erroneous character data sets are naturally mod- 
eled by bipartite draft-graphs: joker-edges represent incompletenesses and edit costs allow 
parsimonious error-corrections. 

Supertrees. The most interesting feature of Min Edit is that it can be thought of as a supertree
construction problem, and more precisely, as the optimization problem underlying MRF [HO □HE2I.

3 Previous results 

Min Edit-$X$ has been studied for several subsets $X \subseteq \overline{\mathbb{Z}}$ gl 0121111311211131 113 EJBE21I23,
sometimes implicitly. Let $H = (C, S, F)$ be a bipartite draft-graph and let $k$ be a
non-negative integer.

Put $\overline{\mathbb{Z}}_+ = \{n \in \overline{\mathbb{Z}} : n \ge 0\}$ and $\overline{\mathbb{Z}}_- = \{n \in \overline{\mathbb{Z}} : n \le 0\}$. If $H$ has no non-edge, or
equivalently, if the weight range of $H$ is a subset of $\overline{\mathbb{Z}}_+$, then the complete bipartite graph
$K = (C, S, C \times S)$ is an M-free edition of $H$ such that $A(K, H) = 0$. In the same way, if
$H$ has no edge then the empty bipartite graph $K = (C, S, \emptyset)$ is an M-free edition of $H$ such
that $A(K, H) = 0$. Hence, Min Edit-$\overline{\mathbb{Z}}_+$ and Min Edit-$\overline{\mathbb{Z}}_-$ are trivial problems.

Now, consider the case where the weight range of $H$ is a subset of $\{-\infty, +\infty\}$. The
bipartite graph

$$G = (C, S, \{e \in C \times S : F(e) = +\infty\})$$

is an edition of $H$ such that $A(G, H) = 0$; for every edition $G'$ of $H$, $G' \ne G$ implies
$A(G', H) = +\infty$. Therefore, solving Min Edit on $H$ reduces to deciding whether $G$ is M-free,
which can be achieved in $O(|C|\,|S|)$ time [20, 21]. Hence, Min Edit-$\{-\infty, +\infty\}$ can
be solved in polynomial time because it reduces to the recognition problem associated with
the class of M-free bipartite graphs. More generally, Min Edit-$\{-\infty, 0, +\infty\}$ can also be
solved in polynomial time because it reduces to the sandwich problem [19] associated with
the class of M-free bipartite graphs: in the case where the weight range of $H$ is a subset
of $\{-\infty, 0, +\infty\}$, Min Edit can be solved on $H$ in $O(|C|\,|S|)$ time [25].
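
The case of weights restricted to minus and plus infinity can be sketched as follows. This is a simplified illustration, not the linear-time procedure cited above: the function name solve_infinite_weights is hypothetical, infinite weights are assumed to be encoded as math.inf, and M-freeness is tested with a naive pairwise comparison of neighborhoods.

```python
import math
from itertools import combinations

def solve_infinite_weights(C, S, F):
    """Return the edge set of the unique zero-cost edition if it is M-free, else None."""
    # The only edition of finite cost keeps exactly the pairs of weight +infinity.
    E = {(c, s) for c in C for s in S if F[(c, s)] == math.inf}
    N = [frozenset(s for s in S if (c, s) in E) for c in C]
    m_free = all((X & Y) in (frozenset(), X, Y) for X, Y in combinations(N, 2))
    return E if m_free else None
```

A return value of None means that every M-free edition has infinite cost on the given instance.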

Put $I = \{-1, +\infty\}$, $D = \{-\infty, +1\}$, and $U = \{-1, +1\}$. Min Edit-$I$, Min Edit-$D$, and
Min Edit-$U$ (also known as Minimum-Flip Consensus Tree) are the three unweighted
edge-modification problems [24] associated with the class of M-free bipartite graphs: Min
Edit-$I$ is the insertion (or completion) problem and Min Edit-$D$ is the deletion problem.
Edit-$I$, Edit-$D$, and Edit-$U$ are NP-complete [12].

Put $\overline{\mathbb{Z}}^* = \overline{\mathbb{Z}} \setminus \{0\}$. Min Edit-$\overline{\mathbb{Z}}^*$ is the restriction of Min Edit to those bipartite
draft-graphs that have no joker-edge. The most positive result concerning Min Edit is that
Edit-$\overline{\mathbb{Z}}^*$ is FPT: in the case where the weight range of $H$ is a subset of $\overline{\mathbb{Z}}^*$, deciding whether
$(H, k)$ is a yes-instance of Edit (and if so, computing an M-free edition $G$ of $H$ such that
$A(G, H) \le k$) can be achieved in $O(6^k |C|\,|S|)$ time [12]. Better FPT algorithms have been
presented for the special cases Min Edit-$I$ [12], Min Edit-$D$ [12], and Min Edit-$U$ [5, 22].
In particular, Edit-$U$ has a polynomial kernel [22].

Exact algorithms based on Integer Linear Programming [13], as well as heuristics [10, 16],
have been tested for Min Edit-$\{-1, 0, +1\}$ (also known as Minimum-Flip Supertree).

4 Contribution 

The aim of the present paper is to complete the study of Min Edit by proving: 

Theorem 1. For all $\alpha, \beta \in \overline{\mathbb{Z}}$ such that $-\alpha < 0 < \beta$ and $(\alpha, \beta) \ne (+\infty, +\infty)$, the following
two statements hold:

1. Edit-$\{-\alpha, 0, \beta\}$ is W[2]-hard and

2. if there exists a real constant $\rho \ge 1$ such that Min Edit-$\{-\alpha, 0, \beta\}$ is $\rho$-approximable in
polynomial time then P = NP.

The intractabilities of Min Edit-$\{-1, 0, +\infty\}$, Min Edit-$\{-\infty, 0, +1\}$, and Min Edit-$\{-1, 0, +1\}$
(also known as Minimum-Flip Supertree) follow from Theorem 1.

Our proof of Theorem 1 requires the introduction of some material and results from the
literature [8]. For all $x, y, z$, let $(x, y | z)$ denote the unique phylogeny for $\{x, y, z\}$ having $\{x, y\}$
as a cluster:

$$(x, y | z) = \{\emptyset, \{x\}, \{y\}, \{z\}, \{x, y\}, \{x, y, z\}\}.$$

A resolved triplet is a phylogeny of the form $(x, y | z)$ for some pairwise distinct $x, y, z$. Given
a phylogeny $T$ for some superset of $\{x, y, z\}$, we say that $(x, y | z)$ fits $T$ if there exists a cluster
$X$ of $T$ such that $X \cap \{x, y, z\} = \{x, y\}$.
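
The fitting condition is a one-line test over the clusters of T. The sketch below is illustrative: the names fits and rti_measure are hypothetical, a resolved triplet (x, y | z) is assumed to be encoded as the Python tuple (x, y, z), and a phylogeny as a collection of sets.

```python
def fits(triplet, T):
    """True if the resolved triplet (x, y | z) fits the phylogeny T."""
    x, y, z = triplet
    return any(frozenset(X) & {x, y, z} == {x, y} for X in T)

def rti_measure(R, T):
    """Number of triplets in R that do not fit T."""
    return sum(1 for triplet in R if not fits(triplet, T))
```

For the phylogeny on a, b, c, d with non-trivial clusters {a, b} and {a, b, c}, the triplets (a, b | c) and (a, c | d) fit, while (b, c | a) does not, since no cluster contains b and c but not a.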

Name: Minimum Resolved Triplets Inconsistency or Min RTI.
Input: A finite set $S$ and a set $\mathcal{R}$ of resolved triplets with leaves in $S$.
Solution: A phylogeny $T$ for $S$.
Measure: The number of those elements of $\mathcal{R}$ that do not fit $T$.

Name: Resolved Triplets Inconsistency or RTI.
Input: A finite set $S$, a set $\mathcal{R}$ of resolved triplets with leaves in $S$, and an integer $k \ge 0$.
Question: Is there a phylogeny $T$ for $S$ such that at most $k$ elements of $\mathcal{R}$ do not fit $T$?
Parameter: $k$.

Theorem 2 (Byrka, Guillemot, and Jansson 2010 [8]).

1. RTI is W[2]-hard.

2. If there exists a real constant $\rho \ge 1$ such that Min RTI is $\rho$-approximable in polynomial
time then P = NP.

The idea behind the proof of Theorem 1 is the following: given an instance $(S, \mathcal{R})$ of
Min RTI, computing a "good" solution of Min RTI on $(S, \mathcal{R})$ amounts to computing a "good" MRF
supertree for the phylogenies in $\mathcal{R}$.

Proof (Proof of Theorem 1.1). Theorem 1.1 is deduced from Theorem 2.1: we show that RTI
FPT-reduces to Edit-$\{-\alpha, 0, \beta\}$. Put $\gamma = \min\{\alpha, \beta\}$. Note that $\gamma$ is a positive integer
because $\alpha$ and $\beta$ are positive and not both infinite.

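Let $(S, \mathcal{R}, k)$ be an arbitrary instance of RTI. The reduction maps $(S, \mathcal{R}, k)$ to an instance
$(H, \gamma k)$ of Edit-$\{-\alpha, 0, \beta\}$, where $H$ is as follows. Let $C = \{1, 2, \ldots, |\mathcal{R}|\}$. Write $\mathcal{R}$ in the
form

$$\mathcal{R} = \{(x_c, y_c | z_c) : c \in C\}.$$

Let $F$ be the function from $C \times S$ to $\overline{\mathbb{Z}}$ given by:

$$F(c, s) = \begin{cases} \beta & \text{if } s \in \{x_c, y_c\}, \\ 0 & \text{if } s \notin \{x_c, y_c, z_c\}, \\ -\alpha & \text{if } s = z_c \end{cases}$$

for all $(c, s) \in C \times S$. Let $H = (C, S, F)$.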
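
The construction of the draft-graph from the triplets can be sketched in a few lines. This is an illustration under assumptions: the function name rti_to_edit is hypothetical, triplets are assumed to be given as a list of tuples (x, y, z) encoding (x, y | z), and an infinite weight is assumed to be passed as math.inf.

```python
import math

def rti_to_edit(S, R, k, alpha, beta):
    """Map an RTI instance (S, R, k) to an Edit instance ((C, S, F), gamma * k)."""
    if alpha <= 0 or beta <= 0 or (alpha, beta) == (math.inf, math.inf):
        raise ValueError("need alpha, beta > 0 and not both infinite")
    gamma = min(alpha, beta)
    C = list(range(1, len(R) + 1))  # one character per triplet
    F = {}
    for c, (x, y, z) in zip(C, R):
        for s in S:
            if s in (x, y):
                F[(c, s)] = beta    # x_c and y_c should carry character c
            elif s == z:
                F[(c, s)] = -alpha  # z_c should not carry character c
            else:
                F[(c, s)] = 0       # every other species is a joker-edge
    return (C, list(S), F), gamma * k
```

For example, with species a, b, c, d, the triplets (a, b | c) and (c, d | a), and alpha = beta = 1, the first row of weights over a, b, c, d is 1, 1, -1, 0 and the second is -1, 0, 1, 1.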

Clearly $(H, \gamma k)$ is computable from $(S, \mathcal{R}, k)$ in polynomial time. It remains to prove that
$(S, \mathcal{R}, k)$ is a yes-instance of RTI if, and only if, $(H, \gamma k)$ is a yes-instance of Edit.



If. Assume that $(H, \gamma k)$ is a yes-instance of Edit. Then, there exists an M-free edition $G$ of
$H$ such that $A(G, H) \le \gamma k$. Let $C'$ denote the set of all $c \in C$ such that $(c, s)$ is a conflict
between $G$ and $H$ for at least one $s \in \{x_c, y_c, z_c\}$. Since there are at least $|C'|$ conflicts between
$G$ and $H$, each of edit cost at least $\gamma$, we have $\gamma |C'| \le A(G, H)$, and thus $|C'| \le k$. Let $T$ be
a perfect phylogeny for $G$. For each $c \in C \setminus C'$, we have $N_G(c) \cap \{x_c, y_c, z_c\} = \{x_c, y_c\}$, and
thus $(x_c, y_c | z_c)$ fits $T$. Hence, $T$ is a phylogeny for $S$ such that at most $k$ elements of $\mathcal{R}$ do
not fit $T$. Therefore, $(S, \mathcal{R}, k)$ is a yes-instance of RTI.

Only if. Assume that $(S, \mathcal{R}, k)$ is a yes-instance of RTI. Then, there exists a phylogeny $T$
for $S$ such that at most $k$ elements of $\mathcal{R}$ do not fit $T$. Let $C'$ denote the set of all $c \in C$
such that $(x_c, y_c | z_c)$ does not fit $T$. For each $c \in C \setminus C'$, let $X_c$ be a cluster of $T$ such that
$X_c \cap \{x_c, y_c, z_c\} = \{x_c, y_c\}$. If $\alpha \le \beta$ then let $X_c = S$ for each $c \in C'$; if $\beta < \alpha$ then let
$X_c = \{x_c\}$ for each $c \in C'$. Put $G = (C, S, \bigcup_{c \in C} \{c\} \times X_c)$. The following properties hold:

1. $G$ is an edition of $H$.

2. $T$ is a perfect phylogeny for $G$ because $N_G(c) = X_c$ is a cluster of $T$ for all $c \in C$.
Therefore, $G$ is M-free.

3. Let $\Gamma$ denote the set of all conflicts between $G$ and $H$. If $\alpha \le \beta$ then $\Gamma = \{(c, z_c) : c \in C'\}$;
if $\beta < \alpha$ then $\Gamma = \{(c, y_c) : c \in C'\}$. The edit cost in $H$ of every conflict between $G$ and
$H$ equals $\gamma$. Therefore, we have $A(G, H) = \gamma |\Gamma| = \gamma |C'| \le \gamma k$.

Hence, $(H, \gamma k)$ is a yes-instance of Edit. □
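
The edition built in the "only if" direction can be checked on small examples. The sketch below is illustrative only: the names edition_from_phylogeny and conflict_cost are hypothetical, triplets are assumed to be tuples (x, y, z) encoding (x, y | z), characters are numbered from 1 in the order of the list R, and the weights are recomputed from the reduction rather than stored.

```python
def edition_from_phylogeny(S, R, T, alpha, beta):
    """Edge set of the edition G built from a phylogeny T in the 'only if' direction."""
    E = set()
    for c, (x, y, z) in enumerate(R, start=1):
        witnesses = [X for X in T if set(X) & {x, y, z} == {x, y}]
        if witnesses:            # the triplet fits T: no conflict for character c
            X_c = set(witnesses[0])
        elif alpha <= beta:      # single conflict at (c, z_c), of cost alpha
            X_c = set(S)
        else:                    # single conflict at (c, y_c), of cost beta
            X_c = {x}
        E |= {(c, s) for s in X_c}
    return E

def conflict_cost(S, R, E, alpha, beta):
    """A(G, H), where H is the draft-graph produced by the reduction."""
    total = 0
    for c, (x, y, z) in enumerate(R, start=1):
        for s in S:
            w = beta if s in (x, y) else (-alpha if s == z else 0)
            total += max(0, -w) if (c, s) in E else max(0, w)
    return total
```

With the phylogeny on a, b, c, d whose non-trivial clusters are {a, b} and {a, b, c}, and the triplets (a, b | c), (a, c | d), (b, c | a), only the third triplet fails to fit, so the cost is min(alpha, beta) times one: 1 for alpha = beta = 1, and 2 for alpha = 3, beta = 2.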

Proof (Proof of Theorem 1.2). Let $\rho$ be a real number greater than or equal to 1. It follows
from the proof of Theorem 1.1 that if $G$ is a $\rho$-approximate solution of Min Edit on $H$ then
any perfect phylogeny for $G$ is a $\rho$-approximate solution of Min RTI on $(S, \mathcal{R})$. Therefore,
if Min Edit is $\rho$-approximable in polynomial time then Min RTI is also $\rho$-approximable in
polynomial time. It is now clear that Theorem 1.2 follows from Theorem 2.2. □

5 Conclusion 

To conclude, let us contrast Theorem 1 with two recent results.

The Maximum Parsimony (MP) problem [2] is the NP-hard optimization problem [14, 18]
underlying MRP, as Min Edit is the optimization problem underlying MRF. Although Min
Edit is NP-hard to approximate within any constant factor by Theorem 1.2, MP is
1.55-approximable in polynomial time [2].

The parameterized problems Edit and Weighted Fuzzy Cluster Editing (WFCE)
[6] are closely related: WFCE is the draft-graph edition problem corresponding to the class
of $P_3$-free graphs. Edit is W[2]-hard by Theorem 1.1 but WFCE has recently been shown to
be fixed-parameter tractable [25] (see also [TUTS]).

References 

1. A. V. Aho, Y. Sagiv, T. G. Szymanski, and J. D. Ullman. Inferring a tree from lowest common ancestors 
with an application to the optimization of relational expressions. SIAM J. Comput., 10(3):405-421, 1981. 

2. N. Alon, B. Chor, F. Pardi, and A. Rapoport. Approximate maximum parsimony and ancestral maximum 
likelihood. IEEE/ACM Trans. Comput. Biology Bioinform., 7(1):183-187, 2010. 

3. B. R. Baum. Combining trees as a way of combining data sets for phylogenetic inference, and the desir- 
ability of combining gene trees. Taxon, 41(1):3-10, 1992. 

4. O. R. P. Bininda-Emonds, editor. Phylogenetic Supertrees: Combining Information to Reveal the Tree of 
Life, volume 4 of Computational Biology Series. Kluwer Academic, 2004. 

5. S. Bocker, Q. B. A. Bui, and A. Truss. An improved fixed-parameter algorithm for minimum-flip consensus
trees. In M. Grohe and R. Niedermeier, editors, Proc. of International Workshop on Parameterized and
Exact Computation (IWPEC 2008), volume 5018 of Lect. Notes Comput. Sci., pages 43-54.
Springer-Verlag, 2008.

6. H. L. Bodlaender, M. R. Fellows, P. Heggernes, F. Mancini, C. Papadopoulos, and F. Rosamond. Clustering 
with partial information. Theor. Comput. Sci., 411(7-9):1202-1211, 2010. 

7. N. Bousquet, J. Daligault, and S. Thomasse. Multicut is FPT. In Proc. of ACM Symposium on Theory 
of Computing (STOC 2011), pages 459-468. ACM, 2011. 



Intractability of the Minimum-Flip Supertree problem and its variants 



8. J. Byrka, S. Guillemot, and J. Jansson. New results on optimizing rooted triplets consistency. Discrete 
Appl. Math., 158(11):1136-1147, 2010. 

9. D. Chen, L. Diao, O. Eulenstein, D. Fernandez-Baca, and M. J. Sanderson. Flipping: A supertree
construction method. In M. F. Janowitz, F.-J. Lapointe, F. R. McMorris, B. Mirkin, and F. S. Roberts,
editors, Bioconsensus: DIMACS Working Group Meetings on Bioconsensus, volume 61 of DIMACS Series
in Discrete Mathematics and Theoretical Computer Science, pages 135-160. Amer. Math. Soc., 2003.

10. D. Chen, O. Eulenstein, D. Fernandez-Baca, and J. G. Burleigh. Improved heuristics for minimum-flip 
supertree construction. Evol. Bioinform. Online, 2:347-356, 2006. 

11. D. Chen, O. Eulenstein, D. Fernandez-Baca, and M. Sanderson. Supertrees by flipping. In Proc. of
Conference on Computing and Combinatorics (COCOON 2002), volume 2387 of Lect. Notes Comput.
Sci., pages 391-400. Springer-Verlag, 2002.

12. D. Chen, O. Eulenstein, D. Fernandez-Baca, and M. Sanderson. Minimum-flip supertrees: Complexity and 
algorithms. IEEE/ACM Trans. Comput. Biology Bioinform., 3(2):165-173, 2006. 

13. M. Chimani, S. Rahmann, and S. Bocker. Exact ILP solutions for phylogenetic minimum flip problems. 
In Proc. of ACM Conf. on Bioinformatics and Computational Biology (ACM-BCB 2010), pages 147-153. 
ACM, 2010. 

14. W. Day, D. Johnson, and D. Sankoff. The computational complexity of inferring rooted phylogenies by
parsimony. Math. Biosci., 81(1):33-42, 1986.

15. E. D. Demaine, D. Emanuel, A. Fiat, and N. Immorlica. Correlation clustering in general weighted graphs.
Theor. Comput. Sci., 361(2-3):172-187, 2006.

16. O. Eulenstein, D. Chen, J. G. Burleigh, D. Fernandez-Baca, and M. J. Sanderson. Performance of flip
supertree construction with a heuristic algorithm. Syst. Biol., 53(2):299-308, 2004.

17. J. Flum and M. Grohe. Parameterized Complexity Theory. Springer-Verlag, 2006.

18. L. Foulds and R. L. Graham. The Steiner problem in phylogeny is NP-complete. Adv. Appl. Math.,
3(1):43-49, 1982.

19. M. C. Golumbic, H. Kaplan, and R. Shamir. Graph sandwich problems. J. Algorithms, 19(3):449-473, 
1995. 

20. D. Gusfield. Efficient algorithms for inferring evolutionary trees. Networks, 21(1):19-28, 1991.

21. D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. 
Cambridge University Press, 1997. 

22. C. Komusiewicz and J. Uhlmann. A cubic-vertex kernel for flip consensus tree. In Proc. of Foundations of
Software Technology and Theoretical Computer Science (FSTTCS 2008), volume 2 of Leibniz International
Proceedings in Informatics, pages 280-291. Dagstuhl, 2008.

23. D. Marx and I. Razgon. Fixed-parameter tractability of multicut parameterized by the size of the cutset. 
In Proc. of ACM Symposium on Theory of Computing (STOC 2011), pages 469-478. ACM, 2011. 

24. A. Natanzon, R. Shamir, and R. Sharan. Complexity classification of some edge modification problems. 
Discrete Appl. Math., 113(1):109-128, 2001. 

25. I. Pe'er, T. Pupko, R. Shamir, and R. Sharan. Incomplete directed perfect phylogeny. SIAM J. Comput., 
33(3):590-607, 2004. 

26. M. A. Ragan. Phylogenetic inference based on matrix representation of trees. Mol. Phylogenet. Evol.,
1(1):53-58, 1992.



